{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Introduction\n",
    "\n",
    "In this notebook we'll demonstrate how to use BigDL-Nano to accelerate custom train loop easily with very few changes."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Prepare Environment\n",
    "\n",
    "Before you start with APIs delivered by BigDL-Nano, you have to make sure BigDL-Nano is correctly installed for PyTorch. If not, please follow [this](../../../../../docs/readthedocs/source/doc/Nano/Overview/nano.md) to setup your environment."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load Cifar10 Dataset\n",
    "\n",
    "Import Cifar10 dataset from torch_vision and modify the train transform. You could access [CIFAR10](https://www.cs.toronto.edu/~kriz/cifar.html) for a view of the whole dataset.\n",
    "\n",
    "Leveraging OpenCV and libjpeg-turbo, BigDL-Nano can accelerate computer vision data pipelines by providing a drop-in replacement of torch_vision's `datasets` and `transforms`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "from torch.utils.data import DataLoader, Subset\n",
    "\n",
    "from bigdl.nano.pytorch.vision import transforms\n",
    "from bigdl.nano.pytorch.vision.datasets import CIFAR10\n",
    "\n",
    "def create_dataloader(data_path, batch_size):\n",
    "    train_transform = transforms.Compose([\n",
    "        transforms.Resize(256),\n",
    "        transforms.ColorJitter(),\n",
    "        transforms.RandomCrop(224),\n",
    "        transforms.RandomHorizontalFlip(),\n",
    "        transforms.Resize(128),\n",
    "        transforms.ToTensor()\n",
    "    ])\n",
    "\n",
    "    full_dataset = CIFAR10(root=data_path, train=True,\n",
    "                           download=True, transform=train_transform)\n",
    "\n",
    "    # use a subset of full dataset to shorten the training time\n",
    "    train_dataset = Subset(dataset=full_dataset, indices=list(range(len(full_dataset) // 40)))\n",
    "\n",
    "    train_loader = DataLoader(train_dataset, batch_size=batch_size,\n",
    "                              shuffle=True, num_workers=0)\n",
    "\n",
    "    return train_loader"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Custom Model\n",
    "\n",
    "We use the Resnet18 module but add a Linear layer to change its output size to 10, because the CIFAR10 dataset has 10 classes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "from torch import nn\n",
    "\n",
    "from bigdl.nano.pytorch.vision.models import vision\n",
    "\n",
    "class ResNet18(nn.Module):\n",
    "    def __init__(self, num_classes, pretrained=True, include_top=False, freeze=True):\n",
    "        super().__init__()\n",
    "        backbone = vision.resnet18(pretrained=pretrained, include_top=include_top, freeze=freeze)\n",
    "        output_size = backbone.get_output_size()\n",
    "        head = nn.Linear(output_size, num_classes)\n",
    "        self.model = nn.Sequential(backbone, head)\n",
    "\n",
    "    def forward(self, x):\n",
    "        return self.model(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define Train Loop\n",
    "\n",
    "Suppose the custom train loop is as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Files already downloaded and verified\n",
      "avg_loss: 3.538733959197998\n",
      "avg_loss: 3.2777256965637207\n",
      "avg_loss: 3.0464797019958496\n",
      "avg_loss: 2.5146260261535645\n",
      "avg_loss: 2.3445262908935547\n",
      "avg_loss: 2.301717519760132\n",
      "avg_loss: 2.1688358783721924\n",
      "avg_loss: 2.1054847240448\n",
      "avg_loss: 2.1210415363311768\n",
      "avg_loss: 2.1197376251220703\n"
     ]
    }
   ],
   "source": [
    "import os\n",
    "import torch\n",
    "\n",
    "data_path = os.environ.get(\"DATA_PATH\", \".\")\n",
    "batch_size = 256\n",
    "max_epochs = 10\n",
    "lr = 0.01\n",
    "\n",
    "model = ResNet18(10, pretrained=False, include_top=False, freeze=True)\n",
    "loss_func = nn.CrossEntropyLoss()\n",
    "optimizer = torch.optim.Adam(model.parameters(), lr=lr)\n",
    "train_loader = create_dataloader(data_path, batch_size)\n",
    "\n",
    "model.train()\n",
    "\n",
    "for _i in range(max_epochs):\n",
    "    total_loss, num = 0, 0\n",
    "    for X, y in train_loader:\n",
    "        optimizer.zero_grad()\n",
    "        loss = loss_func(model(X), y)\n",
    "        loss.backward()\n",
    "        optimizer.step()\n",
    "        \n",
    "        total_loss += loss.sum()\n",
    "        num += 1\n",
    "    print(f'avg_loss: {total_loss / num}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `TorchNano` (`bigdl.nano.pytorch.TorchNano`) class is what we use to accelerate raw pytorch code. By using it, we only need to make very few changes to accelerate custom training loop.\n",
    "\n",
    "We only need the following steps:\n",
    "\n",
    "- define a class `MyNano` derived from our `TorchNano`\n",
    "- copy all lines of code into the `train` method of `MyNano`\n",
    "- add one line to setup model, optimizer and dataloader\n",
    "- replace the `loss.backward()` with `self.backward(loss)`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import torch\n",
    "\n",
    "from bigdl.nano.pytorch import TorchNano\n",
    "\n",
    "class MyNano(TorchNano):\n",
    "    def train(self):\n",
    "        # copy all lines of code into this method\n",
    "        data_path = os.environ.get(\"DATA_PATH\", \".\")\n",
    "        batch_size = 256\n",
    "        max_epochs = 10\n",
    "        lr = 0.01\n",
    "\n",
    "        model = ResNet18(10, pretrained=False, include_top=False, freeze=True)\n",
    "        loss_func = nn.CrossEntropyLoss()\n",
    "        optimizer = torch.optim.Adam(model.parameters(), lr=lr)\n",
    "        train_loader = create_dataloader(data_path, batch_size)\n",
    "\n",
    "        # add this line to setup model, optimizer and dataloaders\n",
    "        model, optimizer, train_loader = self.setup(model, optimizer, train_loader)\n",
    "\n",
    "        model.train()\n",
    "\n",
    "        for _i in range(max_epochs):\n",
    "            total_loss, num = 0, 0\n",
    "            for X, y in train_loader:\n",
    "                optimizer.zero_grad()\n",
    "                loss = loss_func(model(X), y)\n",
    "                self.backward(loss)\n",
    "                optimizer.step()\n",
    "                \n",
    "                total_loss += loss.sum()\n",
    "                num += 1\n",
    "            print(f'avg_loss: {total_loss / num}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Train in Non-distributed Mode\n",
    "\n",
    "To run the train loop, we only need to create an instance of `MyNano` and call its `train` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Files already downloaded and verified\n",
      "avg_loss: 3.8871588706970215\n",
      "avg_loss: 4.426192283630371\n",
      "avg_loss: 3.148921251296997\n",
      "avg_loss: 2.879124641418457\n",
      "avg_loss: 2.5443203449249268\n",
      "avg_loss: 2.3415424823760986\n",
      "avg_loss: 2.2631752490997314\n",
      "avg_loss: 2.1276562213897705\n",
      "avg_loss: 2.108708143234253\n",
      "avg_loss: 2.091210126876831\n"
     ]
    }
   ],
   "source": [
    "MyNano().train()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Intel Extension for Pytorch (a.k.a [IPEX](https://github.com/intel/intel-extension-for-pytorch)) extends Pytorch with optimizations on intel hardware. BigDL-Nano also integrates IPEX into the `TorchNano`, you can turn on IPEX optimization by setting `use_ipex=True`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Files already downloaded and verified\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/root/miniconda3/envs/nano-dev/lib/python3.7/site-packages/intel_extension_for_pytorch/optim/_optimizer_utils.py:207: UserWarning: Does not suport fused step for <class 'torch.optim.adam.Adam'>, will use non-fused step\n",
      "  warnings.warn(\"Does not suport fused step for \" + str(type(optimizer)) + \", will use non-fused step\")\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "avg_loss: 3.9080657958984375\n",
      "avg_loss: 3.6406493186950684\n",
      "avg_loss: 3.0531580448150635\n",
      "avg_loss: 2.6435179710388184\n",
      "avg_loss: 2.420058488845825\n",
      "avg_loss: 2.249678134918213\n",
      "avg_loss: 2.1675872802734375\n",
      "avg_loss: 2.126716136932373\n",
      "avg_loss: 2.095839500427246\n",
      "avg_loss: 2.0563669204711914\n"
     ]
    }
   ],
   "source": [
    "MyNano(use_ipex=True).train()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Train in Distributed Mode\n",
    "\n",
    "You can set the number of processes to enable distributed training to acclerate training. You can also set different distributed backends, now BigDL-Nano supports `spawn`, `subprocess` and `ray`, the default backend is `subprocess`.\n",
    "\n",
    "- Note: only the `subprocess` backend can be used in interactive environment."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/2\n",
      "Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/2\n",
      "----------------------------------------------------------------------------------------------------\n",
      "distributed_backend=gloo\n",
      "All distributed processes registered. Starting with 2 processes\n",
      "----------------------------------------------------------------------------------------------------\n",
      "\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Files already downloaded and verified\n",
      "Files already downloaded and verified\n",
      "avg_loss: 4.357134819030762\n",
      "avg_loss: 4.236389636993408\n",
      "avg_loss: 3.013183832168579\n",
      "avg_loss: 3.0220577716827393\n",
      "avg_loss: 3.2535336017608643\n",
      "avg_loss: 3.0781147480010986\n",
      "avg_loss: 3.075800657272339\n",
      "avg_loss: 3.1724421977996826\n",
      "avg_loss: 3.1078386306762695\n",
      "avg_loss: 2.918447494506836\n",
      "avg_loss: 2.6689560413360596\n",
      "avg_loss: 2.783597230911255\n",
      "avg_loss: 2.3848423957824707avg_loss: 2.309765100479126\n",
      "\n",
      "avg_loss: 2.3010752201080322\n",
      "avg_loss: 2.279109001159668\n",
      "avg_loss: 2.2204744815826416avg_loss: 2.2609469890594482\n",
      "\n",
      "avg_loss: 2.1395022869110107\n",
      "avg_loss: 2.211986541748047\n"
     ]
    }
   ],
   "source": [
    "MyNano(num_processes=2, distributed_backend=\"subprocess\").train()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Of course you can enable both distributed training and IPEX"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/2\n",
      "Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/2\n",
      "----------------------------------------------------------------------------------------------------\n",
      "distributed_backend=gloo\n",
      "All distributed processes registered. Starting with 2 processes\n",
      "----------------------------------------------------------------------------------------------------\n",
      "\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Files already downloaded and verifiedFiles already downloaded and verified\n",
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/root/miniconda3/envs/nano-dev/lib/python3.7/site-packages/intel_extension_for_pytorch/optim/_optimizer_utils.py:207: UserWarning: Does not suport fused step for <class 'torch.optim.adam.Adam'>, will use non-fused step\n",
      "  warnings.warn(\"Does not suport fused step for \" + str(type(optimizer)) + \", will use non-fused step\")\n",
      "/root/miniconda3/envs/nano-dev/lib/python3.7/site-packages/intel_extension_for_pytorch/optim/_optimizer_utils.py:207: UserWarning: Does not suport fused step for <class 'torch.optim.adam.Adam'>, will use non-fused step\n",
      "  warnings.warn(\"Does not suport fused step for \" + str(type(optimizer)) + \", will use non-fused step\")\n",
      "2022-07-27 15:45:26,008 - root - INFO - Reducer buckets have been rebuilt in this iteration.\n",
      "2022-07-27 15:45:26,009 - root - INFO - Reducer buckets have been rebuilt in this iteration.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "avg_loss: 3.8353183269500732\n",
      "avg_loss: 4.010594844818115\n",
      "avg_loss: 3.5082931518554688\n",
      "avg_loss: 3.6064693927764893\n",
      "avg_loss: 3.8697359561920166\n",
      "avg_loss: 3.9609947204589844\n",
      "avg_loss: 3.309493064880371\n",
      "avg_loss: 3.2898263931274414\n",
      "avg_loss: 2.688565969467163\n",
      "avg_loss: 2.639798879623413\n",
      "avg_loss: 2.828411340713501\n",
      "avg_loss: 2.84289288520813\n",
      "avg_loss: 2.4235198497772217\n",
      "avg_loss: 2.4322192668914795\n",
      "avg_loss: 2.3998563289642334\n",
      "avg_loss: 2.399547576904297\n",
      "avg_loss: 2.259463310241699\n",
      "avg_loss: 2.217374563217163\n",
      "avg_loss: 2.2705495357513428avg_loss: 2.2279140949249268\n",
      "\n"
     ]
    }
   ],
   "source": [
    "MyNano(use_ipex=True, num_processes=2, distributed_backend=\"subprocess\").train()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.7.13 ('nano-dev': conda)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.13"
  },
  "orig_nbformat": 4,
  "vscode": {
   "interpreter": {
    "hash": "dda1d5f709f7f060022bc27c348a281835c405e1e2acbb42e3d907d6ed3046bf"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
